Similar Resources
Data-Efficient Policy Evaluation Through Behavior Policy Search
We consider the task of evaluating a policy for a Markov decision process (MDP). The standard unbiased technique for evaluating a policy is to deploy the policy and observe its performance. We show that the data collected from deploying a different policy, commonly called the behavior policy, can be used to produce unbiased estimates with lower mean squared error than this standard technique. W...
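The abstract above concerns off-policy evaluation: estimating a target policy's value from data collected by a different behavior policy. A minimal sketch of the standard ordinary importance-sampling estimator, in a hypothetical two-armed bandit (the action probabilities, reward means, and variable names below are illustrative assumptions, not taken from the paper):

```python
import numpy as np

rng = np.random.default_rng(0)

# Hypothetical two-armed bandit: true mean reward of each action.
true_means = np.array([0.2, 0.8])

# Behavior policy (collects the data) and evaluation policy (to be evaluated).
behavior = np.array([0.5, 0.5])  # assumed behavior action probabilities
target = np.array([0.1, 0.9])    # assumed evaluation action probabilities

# Log data by deploying the behavior policy.
n = 100_000
actions = rng.choice(2, size=n, p=behavior)
rewards = rng.binomial(1, true_means[actions]).astype(float)

# Ordinary importance sampling: reweight each logged reward by the
# likelihood ratio pi_e(a) / pi_b(a); the mean is unbiased for the
# evaluation policy's expected reward.
weights = target[actions] / behavior[actions]
is_estimate = float(np.mean(weights * rewards))

true_value = float(target @ true_means)  # 0.1*0.2 + 0.9*0.8 = 0.74
print(is_estimate, true_value)
```

With 100,000 logged samples the estimate lands close to the true value 0.74; the paper's point is that the *choice* of behavior policy controls the variance of exactly this kind of estimator.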
Data-Efficient Off-Policy Policy Evaluation for Reinforcement Learning
In this paper we present a new way of predicting the performance of a reinforcement learning policy given historical data that may have been generated by a different policy. The ability to evaluate a policy from historical data is important for applications where the deployment of a bad policy can be dangerous or costly. We show empirically that our algorithm produces estimates that often have ...
Origins of Armenia's Foreign Policy and Its Foreign Policy Towards Iran
Foreign policy takes root in complicated matters; however, this may be even more true of Armenia. Although the current government of Armenia is less than 20 years old, the people of this territory were the first to officially accept Christianity. In ancient times, these people were part of great empires such as Iran, Rome, and Byzantium. Armenia is regarded as a nation with a privileged hist...
Statistics & Clustering Based Framework for Efficient XACML Policy Evaluation
The adoption of XACML as the standard for specifying access control policies for various applications, especially web services, is increasing rapidly. A policy evaluation engine can easily become a bottleneck when enforcing large policies. In this paper we propose an adaptive approach for XACML policy optimization: a clustering technique that categorizes policies and rules within a po...
Sample-efficient Nonstationary Policy Evaluation for Contextual Bandits
We present and prove properties of a new offline policy evaluator for an exploration learning setting which is superior to previous evaluators. In particular, it simultaneously and correctly incorporates techniques from importance weighting, doubly robust evaluation, and nonstationary policy evaluation approaches. In addition, our approach allows generating longer histories by careful control o...
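The abstract above combines importance weighting with doubly robust evaluation. A minimal sketch of a doubly robust off-policy estimator for a contextual-bandit-style log (the policies, reward means, and the deliberately imperfect reward model below are illustrative assumptions, not the paper's actual evaluator):

```python
import numpy as np

rng = np.random.default_rng(1)

# Hypothetical two-action bandit log.
true_means = np.array([0.3, 0.7])
behavior = np.array([0.6, 0.4])  # assumed logging policy
target = np.array([0.2, 0.8])    # assumed evaluation policy

n = 50_000
actions = rng.choice(2, size=n, p=behavior)
rewards = rng.binomial(1, true_means[actions]).astype(float)

# An (imperfect) reward model q(a); in practice this is fit by regression.
q = np.array([0.25, 0.65])  # assumed model estimates

# Doubly robust estimator: a model-based baseline plus an
# importance-weighted correction on each logged action. It is unbiased
# if either the propensities or the reward model are correct.
w = target[actions] / behavior[actions]
dr_estimate = float(np.mean(target @ q + w * (rewards - q[actions])))

true_value = float(target @ true_means)  # 0.2*0.3 + 0.8*0.7 = 0.62
print(dr_estimate, true_value)
```

Because the correction term subtracts the model's prediction, the residuals it reweights are small, which is why doubly robust estimators typically have lower variance than plain importance sampling.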
Journal
Journal title: IEEE Transactions on Neural Networks and Learning Systems
Year: 2019
ISSN: 2162-237X,2162-2388
DOI: 10.1109/tnnls.2018.2871361